We show that at second order, ensemble averages of observables and directional averages do not 
commute due to gravitational lensing: observing the same thing in many directions over the sky 
is not the same as taking an ensemble average. In principle this non-commutativity is significant 
for a variety of quantities that we often use as observables and can lead to a bias in parameter 
estimation. We derive the relation between the ensemble average and the directional average of an 
observable, at second order in perturbation theory. We discuss the relevance of these two types 
of averages for making predictions of cosmological observables, focusing on observables related to 
distances and magnitudes. In particular, we show that the ensemble average of the distance in a 
given observed direction is increased by gravitational lensing, whereas the directional average of the 
distance is decreased. For a generic observable, there exists a particular function of the observable 
that is not affected by second-order lensing perturbations. We also show that standard areas have 
an advantage over standard rulers, and we discuss the subtleties involved in averaging in the case 
of supernova observations. 


I. INTRODUCTION 

Cosmological observations have become very precise. Especially for the analysis of the cosmic microwave background 
(CMB) data one has to take into account not only first-order perturbations but also second-order effects like lensing [refs]. 
For other perturbed quantities like supernova distances and redshifts as a function of observed direction [refs], 
cosmic shear [refs] and galaxy number counts [refs], second-order perturbative expressions have recently been 
published and demonstrated to be possibly non-negligible. We need to include these second-order effects if we want 
to compare theory with very precise observations. Their measurement is also an opportunity to test general relativity 
since most of these effects are different in theories which modify gravity. 

In this paper we show that special attention has to be given when comparing a second-order calculation to 
observations. In cosmology, since we have only one universe at our disposal, we often replace ensemble averages by 
averages over directions. Here we show that at second order, directional and ensemble averages do not commute. This 
means that the ergodic assumption is broken by observation on the observer's past light-cone: due to gravitational 
lensing, observing the same thing in many directions over the sky is not the same as taking an ensemble average. 

The existence of different kinds of averages has already been discussed by Kibble and Lieu [ref], for the particular 
case of the magnification $\mu$. They argued that the average over random directions in the sky is not the same as the 
average over a random distribution of sources. They showed that the 'random-source average' of the magnification $\mu$ 
is exactly given by its background value (a result previously demonstrated by Weinberg [ref]) but that the 
'random-direction average' of the magnification is affected by perturbations. They found that the quantity that is invariant 
under random-direction average is the reciprocal magnification $\mu^{-1}$. 

Here we extend this distinction to arbitrary observables. We argue that theoretically, we can only calculate ensemble 
averages, whereas observationally we usually average over directions. To compare theory with observation, we need 
therefore to first take the directional average of the observable and then the ensemble average. We give an explicit 
expression for the difference between this procedure and the ensemble average in a single line of sight. Our calculation 
is valid up to second order in perturbation theory and in the regime of weak lensing$^1$. 

We apply our formalism to various observables. In particular we discuss the case of the distance, showing that the 
directional average of the distance (followed by an ensemble average) is smaller than its background value, whereas 
the ensemble average of the distance along a single line of sight is larger than its background value. We also discuss 
the case of isolated standard candles and standard rulers for which it is not so clear which averaging procedure to 
consider, since we do not usually have many sources at a given redshift. We illustrate the different biases in different 
variables and we discuss how to construct observables which minimise the bias from second-order perturbations. 
Finally we apply our formalism to the CMB angular power spectrum. We show that in multipole-space, the average 
over directions is automatically taken before the ensemble average. 

The rest of the paper is organised as follows: in section II we show that ensemble average and directional average do 
not commute. We derive a general relation between the two types of averages, valid at second order in perturbation 
theory. In section III we discuss various examples and in section IV we conclude. 


II. AVERAGING AT SECOND ORDER 

We consider a cosmological experiment, i.e. an observation where we detect photons coming in at direction $\mathbf{n}_o$ from 
a source situated at redshift z. Our observable can be the density of galaxies in a certain direction, the temperature 
of the CMB, the luminosity of a supernova, etc. We usually repeat the measurement over various directions in the 
sky, and we measure the mean of the observable and/or the correlation functions (averaged over all directions at a 
fixed angular separation). 

To compare these measurements with theoretical predictions we have to perform ensemble averages. Cosmological 
perturbations are stochastic fields and only their ensemble average and their variance (or other higher-order correlation 
functions) can be calculated. The usual procedure is to assume that due to stochastic isotropy and the ergodic 
principle, the (measured) directional average of our observable is equal to its ensemble average. Here we show that 
this procedure is correct only at first order in perturbation theory. At second order, ensemble averages and averaging 
over directions are two distinct procedures, which do not commute. 

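We consider an arbitrary function of direction $f(\mathbf{n}_o)$ in the sky. Here $\mathbf{n}_o$ is the (lensed) observed direction. In an 
unperturbed universe, $f$ does not depend on direction: $f = f_0$. In a perturbed universe, we expand $f$ around $f_0$, in 
perturbations of order $\epsilon$: 

\begin{equation}
f(\mathbf{n}_o) = f_0\left[1 + \epsilon\,\delta_1(\mathbf{n}_o) + \frac{\epsilon^2}{2}\,\delta_2(\mathbf{n}_o) + O(3)\right]. \tag{1}
\end{equation}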


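Taking the expectation value of (1) and assuming Gaussianity (so that the expectation value of third-order 
perturbations vanishes) we get 

\begin{equation}
\langle f(\mathbf{n}_o)\rangle = f_0\left[1 + \epsilon\,\langle\delta_1(\mathbf{n}_o)\rangle + \frac{\epsilon^2}{2}\,\langle\delta_2(\mathbf{n}_o)\rangle + O(4)\right]. \tag{2}
\end{equation}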


Naively we would set $\langle\delta_1(\mathbf{n}_o)\rangle = 0$. However, at second order in perturbation theory this is not correct. When we go 
to second order, we have to take into account that the observed photon direction $\mathbf{n}_o$ is lensed from the unperturbed 
source direction (see Fig. 1), $\mathbf{n} = \mathbf{n}_o + \boldsymbol{\alpha}$, where $\boldsymbol{\alpha} = \epsilon\boldsymbol{\alpha}_1 + \epsilon^2\boldsymbol{\alpha}_2/2 + O(3)$ is the deflection angle, which we assume to 
be small. As a consequence, the distribution of images is not statistically homogeneous and isotropic, meaning that 
$\langle\delta_1(\mathbf{n}_o)\rangle \neq 0$. More precisely we have 

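\begin{equation}
\epsilon\,\langle\delta_1(\mathbf{n}_o)\rangle = \epsilon\,\langle\delta_1(\mathbf{n} - \boldsymbol{\alpha})\rangle = \epsilon\,\langle\delta_1(\mathbf{n})\rangle - \epsilon^2\langle\boldsymbol{\alpha}_1\cdot\nabla\delta_1(\mathbf{n})\rangle + O(3) = -\epsilon^2\langle\boldsymbol{\alpha}_1\cdot\nabla\delta_1(\mathbf{n}_o)\rangle + O(3). \tag{3}
\end{equation}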


In the last equality, we have used the fact that the distribution of the sources is statistically homogeneous and isotropic 
(a consequence of the statistical homogeneity and isotropy of the primordial fluctuations), so that $\langle\delta_1(\mathbf{n})\rangle = 0$. Note 
that in expressions which are already second order, we do not need to distinguish between $\mathbf{n}$ and $\mathbf{n}_o$. 

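The expectation value of the second-order expression for $f$ is therefore of the form 

\begin{equation}
\langle f(\mathbf{n}_o)\rangle = f_0\left[1 - \epsilon^2\langle\boldsymbol{\alpha}_1\cdot\nabla\delta_1(\mathbf{n}_o)\rangle + \frac{\epsilon^2}{2}\,\langle\delta_2\rangle + O(4)\right]. \tag{4}
\end{equation}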


1 We do not discuss here the more difficult problem of how caustics may affect observables [ref]. 

Figure 1: We measure the value of the observable $f$ at the true comoving position of the source $\mathbf{x}$. This position is observed by 
photons coming in at the direction $\mathbf{n}_o$, corresponding to an unperturbed position in the sky $\mathbf{x}_o = \chi\mathbf{n}_o$. The deflection vector 
$\delta\mathbf{x}$ relates $\mathbf{x}_o$ to $\mathbf{x}$ and the deflection angle is defined through $\boldsymbol{\alpha} = \delta\mathbf{x}/\chi$. 

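The second term in (4) directly follows from the non-random distribution of the images generated by gravitational 
lensing. This term can be simplified using 

\begin{equation}
\langle\boldsymbol{\alpha}_1\cdot\nabla\delta_1(\mathbf{n}_o)\rangle = \langle\nabla\cdot(\boldsymbol{\alpha}_1\delta_1)(\mathbf{n}_o)\rangle - \langle\nabla\cdot\boldsymbol{\alpha}_1(\mathbf{n}_o)\,\delta_1\rangle = 2\,\langle\kappa_1\delta_1\rangle, \tag{5}
\end{equation}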


where the second equality follows since a total divergence does not contribute to the average and since the first-order 
convergence is given by $-2\kappa_1 = \nabla\cdot\boldsymbol{\alpha}_1(\mathbf{n}_o)$ (see [4] for details). With this, the expectation value of $f$ becomes 


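\begin{equation}
\langle f(\mathbf{n}_o)\rangle = f_0\left[1 - 2\epsilon^2\langle\kappa_1\delta_1\rangle + \frac{\epsilon^2}{2}\,\langle\delta_2\rangle + O(4)\right]. \tag{6}
\end{equation}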
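The mechanism behind (3) and (5) can be checked in a one-dimensional periodic toy model. This is a minimal sketch under assumptions that are not part of the paper: a single Fourier mode for the observable and for the deflection, hypothetical amplitudes `A` and `B`, and an average over a uniform source coordinate standing in for the ensemble average. In one dimension the analogue of $-2\kappa_1 = \nabla\cdot\boldsymbol{\alpha}_1$ is simply the derivative of the deflection.

```python
import math

def mean_over_period(func, samples=4096):
    """Average func over one period [0, 2*pi) on a uniform grid."""
    step = 2.0 * math.pi / samples
    return sum(func(i * step) for i in range(samples)) / samples

# Hypothetical toy parameters (illustration only).
A, B, k = 1.0, 0.01, 3

def delta(x):
    """Toy first-order observable."""
    return A * math.cos(k * x)

def alpha(x):
    """Toy deflection, correlated with delta."""
    return B * math.sin(k * x)

def div_alpha(x):
    """Derivative of the deflection: the 1D analogue of -2 * kappa_1."""
    return B * k * math.cos(k * x)

# Sources uniform in x; the observable is evaluated at the displaced point x - alpha(x).
displaced_mean = mean_over_period(lambda x: delta(x - alpha(x)))

# Second-order prediction: -<alpha * d(delta)/dx> = <(d alpha/dx) * delta>.
predicted_mean = mean_over_period(lambda x: div_alpha(x) * delta(x))

# Uniform average over the observed coordinate itself.
direct_mean = mean_over_period(delta)

print(displaced_mean, predicted_mean, direct_mean)
```

In this toy model the displaced average agrees with the second-order prediction up to a third-order remainder, while the uniform average over the observed coordinate vanishes. That is the one-dimensional counterpart of the statement that the two averaging procedures differ by a term proportional to $\langle\kappa_1\delta_1\rangle$.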


Let us now see what happens if instead of calculating the expectation value of $f$, we first average over observed 
directions, and then take the ensemble average. From (1) we have$^2$ 




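\begin{equation}
\left\langle\overline{f}\right\rangle \equiv \left\langle\frac{1}{4\pi}\int d\Omega_{\mathbf{n}_o}\, f(\mathbf{n}_o)\right\rangle = f_0\left[1 + \epsilon\left\langle\overline{\delta_1(\mathbf{n}_o)}\right\rangle + \frac{\epsilon^2}{2}\left\langle\overline{\delta_2(\mathbf{n}_o)}\right\rangle + O(3)\right], \tag{7}
\end{equation}

where the overbar denotes the average over observed directions. The first-order term gives 

\begin{equation}
\left\langle\overline{\delta_1(\mathbf{n}_o)}\right\rangle = \left\langle\frac{1}{4\pi}\int d\Omega_{\mathbf{n}_o}\,\delta_1(\mathbf{n}_o)\right\rangle = \left\langle\frac{1}{4\pi}\int d\Omega_{\mathbf{n}}\left|\frac{d\Omega_{\mathbf{n}_o}}{d\Omega_{\mathbf{n}}}\right|\delta_1(\mathbf{n} - \boldsymbol{\alpha})\right\rangle = \left\langle\frac{1}{4\pi}\int d\Omega_{\mathbf{n}}\left[\delta_1(\mathbf{n}) - \epsilon\,\delta_1(\mathbf{n})\nabla\cdot\boldsymbol{\alpha}_1(\mathbf{n}) - \epsilon\,\boldsymbol{\alpha}_1\cdot\nabla\delta_1(\mathbf{n})\right]\right\rangle. \tag{8}
\end{equation}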


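The second and third terms can be combined into a total derivative. Since $\mathbf{n}$ is unperturbed, the directional average 
over $\mathbf{n}$ commutes with the ensemble average, and we obtain 

\begin{equation}
\left\langle\overline{\delta_1(\mathbf{n}_o)}\right\rangle = \frac{1}{4\pi}\int d\Omega_{\mathbf{n}}\left[\langle\delta_1(\mathbf{n})\rangle - \epsilon\,\langle\nabla\cdot(\delta_1(\mathbf{n})\boldsymbol{\alpha}_1)\rangle\right] = 0. \tag{9}
\end{equation}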


This result holds also at next order in perturbation theory. It relies only on the fact that the expectation value of 
total derivatives vanishes in a statistically homogeneous and isotropic universe 3 . 

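For the second-order term in (7), we may neglect the perturbations of the direction (which contribute at third order 
only) so that $\delta_2(\mathbf{n}_o) \simeq \delta_2(\mathbf{n})$, and directional and ensemble average of $\delta_2$ commute 

\begin{equation}
\left\langle\overline{\delta_2(\mathbf{n}_o)}\right\rangle = \langle\delta_2\rangle + O(4). \tag{10}
\end{equation}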


2 Here we assume for simplicity that we average over the whole sky, but the argument holds for any patch of the sky. The boundary terms 
which appear are vectors which disappear after ensemble averaging. 

3 Note that the result does not hold if we enter the strong lensing regime where caustics and multiple images appear, and where the map 
$\mathbf{n}_o \mapsto \mathbf{n}$ is not one to one. However, the regions where this happens contribute a negligible area for most purposes. 
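With this we obtain 

\begin{equation}
\left\langle\overline{f}\right\rangle = f_0\left[1 + \frac{\epsilon^2}{2}\,\langle\delta_2\rangle + O(4)\right] = \langle f(\mathbf{n})\rangle. \tag{11}
\end{equation}

Comparing (11) with (6) we see that 

\begin{equation}
\langle f(\mathbf{n})\rangle = \left\langle\overline{f}\right\rangle = \langle f(\mathbf{n}_o)\rangle + 2 f_0\,\epsilon^2\langle\kappa_1\delta_1\rangle + O(4). \tag{12}
\end{equation}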


This equation provides a general relation between the ensemble and the directional average of any observable, valid at 
second order in perturbation theory, in the weak lensing regime. In particular, it shows that ensemble and directional 
averages commute only if the variable under consideration has vanishing correlation with the convergence (or similarly 
with the deflection angle). An important consequence is that an observable whose ensemble average is invariant under 
second-order perturbations (like for example the magnification $\mu$, see section III) will automatically receive corrections 
when we take its directional average (and vice-versa). 

In practice, if we measure a directional average, $\overline{f}$, the ensemble average which we have to compare our measurements 
with is $\langle\overline{f}\rangle$. However, if we measure just one realisation $f(\mathbf{n}_o)$ in a fixed observed direction $\mathbf{n}_o$, we should compare 
our measurements with $\langle f(\mathbf{n}_o)\rangle$. 

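Before we discuss important examples, let us also note the following: if we observe a power of $f$, say $f^p$, a Taylor 
expansion up to second order yields 

\begin{equation}
\langle f^p(\mathbf{n})\rangle = \left\langle\overline{f^p}\right\rangle = f_0^p\left[1 + \frac{p\,\epsilon^2}{2}\,\langle\delta_2\rangle + \frac{p(p-1)\,\epsilon^2}{2}\,\langle\delta_1^2\rangle + O(4)\right], \tag{13}
\end{equation}

whereas 

\begin{equation}
\langle f^p(\mathbf{n}_o)\rangle = f_0^p\left[1 - 2p\,\epsilon^2\langle\kappa_1\delta_1\rangle + \frac{p\,\epsilon^2}{2}\,\langle\delta_2\rangle + \frac{p(p-1)\,\epsilon^2}{2}\,\langle\delta_1^2\rangle + O(4)\right]. \tag{14}
\end{equation}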


It follows that, choosing $p - 1 = -\langle\delta_2\rangle/\langle\delta_1^2\rangle$, we can avoid second-order corrections to $\langle\overline{f^p}\rangle$, while choosing $p - 1 = 
(4\langle\kappa_1\delta_1\rangle - \langle\delta_2\rangle)/\langle\delta_1^2\rangle$ we can avoid second-order corrections to $\langle f^p(\mathbf{n}_o)\rangle$. Hence there is an optimal power of a given 
variable which removes corrections to the mean at second order. 
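The bookkeeping in (13) and (14) can be sketched in a few lines. The function and argument names below are hypothetical, the smallness parameter is absorbed into the moments, and the numerical value used in the example is purely illustrative.

```python
def second_order_shift(p, mean_delta2, var_delta1, cross_kappa_delta):
    """Fractional second-order shift of the mean of f**p relative to f_0**p.

    Returns (directional_then_ensemble, ensemble_at_fixed_direction),
    following (13) and (14) with epsilon absorbed into the moments.
    """
    common = 0.5 * p * mean_delta2 + 0.5 * p * (p - 1) * var_delta1
    return common, common - 2.0 * p * cross_kappa_delta

def optimal_powers(mean_delta2, var_delta1, cross_kappa_delta):
    """Powers p for which each of the two averages has no second-order shift."""
    p_directional = 1.0 - mean_delta2 / var_delta1
    p_fixed_direction = 1.0 + (4.0 * cross_kappa_delta - mean_delta2) / var_delta1
    return p_directional, p_fixed_direction

# Example: the distance case of section III A, where delta_1 = -kappa_1 and,
# up to total divergences, delta_2 = -kappa_1**2.
kappa_sq = 1.0e-3  # illustrative value of <kappa_1^2>, not taken from the paper
shifts_p1 = second_order_shift(1, -kappa_sq, kappa_sq, -kappa_sq)
best_powers = optimal_powers(-kappa_sq, kappa_sq, -kappa_sq)
print(shifts_p1, best_powers)
```

For these inputs the optimal powers come out as $p = 2$ for the directional average and $p = -2$ for the average at fixed observed direction, in line with the distance results quoted in section III A.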

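More generally, for an arbitrary function $F(f)$ we obtain 

\begin{equation}
\langle F(f(\mathbf{n}))\rangle = \left\langle\overline{F(f)}\right\rangle = F(f_0) + \frac{\epsilon^2}{2}\left[F''(f_0)\,f_0^2\,\langle\delta_1^2\rangle + F'(f_0)\,f_0\,\langle\delta_2\rangle\right] + O(4), \tag{15}
\end{equation}

whereas 

\begin{equation}
\langle F(f(\mathbf{n}_o))\rangle = F(f_0) + \frac{\epsilon^2}{2}\left[F''(f_0)\,f_0^2\,\langle\delta_1^2\rangle + F'(f_0)\,f_0\left(\langle\delta_2\rangle - 4\langle\kappa_1\delta_1\rangle\right)\right] + O(4). \tag{16}
\end{equation}

Hence choosing $F''(f_0)f_0/F'(f_0) = -\langle\delta_2\rangle/\langle\delta_1^2\rangle$ avoids contributions from second-order perturbations to the mean over 
directions, $\langle\overline{F(f)}\rangle$, while choosing $F''(f_0)f_0/F'(f_0) = (4\langle\kappa_1\delta_1\rangle - \langle\delta_2\rangle)/\langle\delta_1^2\rangle$ avoids second-order contributions to 
$\langle F(f(\mathbf{n}_o))\rangle$. 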


III. APPLICATION TO SPECIFIC OBSERVABLES 

We consider various examples and show explicitly how the two averaging procedures lead to different contributions 
from second-order perturbations. 


A. Distances, Angular Sizes and Standard Areas 

The angular diameter distance up to second order has been fully calculated in [refs]. Here we are interested in the 
dominant second-order terms: we want to know the maximum impact that second-order contributions can 
have on the mean distance. As discussed in [4], this means that we need to take into account only the terms with the 
maximal number of transverse derivatives. We can neglect contributions proportional to the gravitational potential 
and its time and radial derivatives, relative to transverse derivatives of the gravitational potential: 

\begin{equation}
\Psi,\ \partial_t\Psi,\ \partial_r\Psi \ll \partial_a\Psi, \tag{17}
\end{equation}

where $\partial_a\Psi = e_a^i\,\partial_i\Psi$ and $e_a = (e_\theta, e_\varphi)$ are orthogonal to the photon propagation direction $\mathbf{n}_o$. 
We calculate the angular diameter distance up to second order with these simplifications. Here we just present the 
result; the detailed calculation is given in [4]. The angular diameter distance is given by the determinant of the $2\times 2$ 
magnification matrix $\mathcal{D}_{ab}$, defined as$^4$ 

\begin{equation}
\mathcal{D}_{ab} = \frac{\lambda}{1+z}\begin{pmatrix} 1 - \kappa - \gamma^{(1)} & -\gamma^{(2)} - \omega \\ -\gamma^{(2)} + \omega & 1 - \kappa + \gamma^{(1)} \end{pmatrix}, \tag{18}
\end{equation}


where $\omega$ is a curl component which vanishes at first order, $-2\kappa = \nabla\cdot\boldsymbol{\alpha}$ is the convergence and $\gamma^{(1)}$ and $\gamma^{(2)}$ 
are the shear components (from now on, we absorb the smallness parameter $\epsilon$ in the perturbation variables). The 
magnification matrix obeys a second-order differential equation, which can be solved order by order in perturbation 
theory. The angular diameter distance is given by 

\begin{equation}
d_A^2 = \frac{\chi^2}{(1+z)^2}\left[1 - 2\kappa + \kappa^2 - |\gamma|^2\right], \tag{19}
\end{equation}


where $|\gamma|^2 = (\gamma^{(1)})^2 + (\gamma^{(2)})^2$ and we have neglected $\omega^2$ which is of order 4 in perturbation theory. We have also set 
$\lambda = \chi = \eta_0 - \eta$, i.e. we neglect the perturbations of the affine parameter which contain fewer transverse derivatives than 
the deflection angle; $\eta_0$ and $\eta$ denote respectively the conformal time today and at the source position. Expanding $\kappa$ 
to second order, $\kappa = \kappa_1 + \kappa_2/2$, and taking the square root of (19), 

\begin{equation}
d_A = d_0\left[1 + \delta_1 + \frac{1}{2}\delta_2 + O(3)\right] = d_0\left[1 - \kappa_1 - \frac{1}{2}\kappa_2 - \frac{1}{2}\kappa_1^2 + \frac{1}{2}\left(\kappa_1^2 - |\gamma_1|^2\right) + O(3)\right], \tag{20}
\end{equation}


where $d_0 = \chi/(1+z)$ is the background distance (recall that we neglect perturbations in the redshift, which contain 
fewer transverse derivatives and are therefore subdominant). As $|\gamma|$ enters squared, we need it only to first order, 
$|\gamma| = |\gamma_1| + O(2)$. 

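We are interested in the average of $d_A$. As demonstrated in [3], the second-order convergence $\kappa_2$ can be written as 
a total divergence, so it does not contribute: both the ensemble average and the directional average of $\kappa_2$ vanish: 

\begin{equation}
\langle\kappa_2\rangle = \left\langle\overline{\kappa_2}\right\rangle = 0. \tag{21}
\end{equation}

Furthermore, as we show in [4], the combination $\kappa_1^2 - |\gamma_1|^2$ is also a total divergence, so that 

\begin{equation}
\left\langle\kappa_1^2 - |\gamma_1|^2\right\rangle = \left\langle\overline{\kappa_1^2 - |\gamma_1|^2}\right\rangle = 0. \tag{22}
\end{equation}

If we take the ensemble average of the directional average, we therefore obtain 

\begin{equation}
\left\langle\overline{d_A}\right\rangle = \langle d_A(\mathbf{n})\rangle = d_0\left(1 - \frac{1}{2}\langle\kappa_1^2\rangle\right). \tag{23}
\end{equation}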


This shows that lensing decreases the directional average of the distance with respect to the background value $d_0$. 

On the other hand, if we calculate the ensemble average of the distance, we have a remaining second-order term 
given by (6). Combining this second-order term with the first-order square, we obtain 

\begin{equation}
\langle d_A(\mathbf{n}_o)\rangle = d_0\left(1 + \frac{3}{2}\langle\kappa_1^2\rangle\right). \tag{24}
\end{equation}


This is the result discussed in [3]. It shows that lensing increases the ensemble average of the distance with respect 
to the background value $d_0$. 

The difference between (23) and (24) can be interpreted in the following way. If we consider many realisations of a 
random but fixed line of sight, the fact that there are structures between the source and the observer will, on average, 
increase the distance. This means that structures generate with higher probability a de-focusing on a random line of 
sight. However, if we average the distance over directions, under-densities which lead to de-focusing are competing 
with over-densities which lead to focusing. Since lensing also changes the distribution of lines of sight within a given 
solid angle, the contribution from under-densities does not exactly cancel the contribution from over-densities. On 
average, the latter dominates, leading to a decrease of the distance. 

4 Since photon propagation is conformally invariant, $\mathcal{D}$ can be calculated in a non-expanding universe. For an expanding universe, we 
multiply $\mathcal{D}$ by the scale factor $a(\eta)$ (see appendix A of [ref]), which we rewrite as $1/(1+z)$ since perturbations in the redshift are 
negligible relative to the terms with four transverse derivatives. 


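Taking an arbitrary power of $d_A$ to second order and making use of (13) and (14), with $\delta_1 = -\kappa_1$ and $\delta_2 = -\kappa_1^2$ + 
divergence, we obtain 

\begin{equation}
\langle d_A^p(\mathbf{n})\rangle = \left\langle\overline{d_A^p}\right\rangle = d_0^p\left[1 + \frac{p(p-2)}{2}\,\langle\kappa_1^2\rangle + O(4)\right], \tag{25}
\end{equation}

whereas 

\begin{equation}
\langle d_A^p(\mathbf{n}_o)\rangle = d_0^p\left[1 + \frac{p(p+2)}{2}\,\langle\kappa_1^2\rangle + O(4)\right]. \tag{26}
\end{equation}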


This shows that the ensemble average of $d_A^p$ experiences no second-order corrections for $p = -2$, while the ensemble 
average of the mean over directions of $d_A^p$ experiences no second-order corrections for $p = +2$. These results are 
completely consistent with [ref], noting that $\mu = d_0^2/\det\mathcal{D}_{ab} = (d_0/d_A)^2$. The background distance is then given by 

\begin{equation}
d_0 \equiv \frac{\chi}{1+z} = \left[\frac{p+2}{4}\left\langle\overline{d_A^p}\right\rangle - \frac{p-2}{4}\left\langle d_A^p(\mathbf{n}_o)\right\rangle\right]^{1/p}. \tag{27}
\end{equation}
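The combination of the two averages that returns the background distance can be checked numerically. The sketch below assumes the second-order expressions $d_0^p\,[1 + p(p-2)\langle\kappa_1^2\rangle/2]$ for the directional average and $d_0^p\,[1 + p(p+2)\langle\kappa_1^2\rangle/2]$ for the average at fixed observed direction, weighted by $(p+2)/4$ and $-(p-2)/4$ respectively. The function names and the input values are hypothetical.

```python
def averaged_power(d0, kappa_sq, p):
    """Second-order averages of d_A**p: (directional, fixed observed direction)."""
    directional = d0**p * (1.0 + 0.5 * p * (p - 2) * kappa_sq)
    fixed_direction = d0**p * (1.0 + 0.5 * p * (p + 2) * kappa_sq)
    return directional, fixed_direction

def background_distance(directional, fixed_direction, p):
    """Weighted combination of the two averages that cancels <kappa_1^2>."""
    combo = 0.25 * (p + 2) * directional - 0.25 * (p - 2) * fixed_direction
    return combo ** (1.0 / p)

# Illustrative inputs (not taken from the paper).
d0_true, kappa_sq = 1500.0, 6.0e-3
recovered = {
    p: background_distance(*averaged_power(d0_true, kappa_sq, p), p)
    for p in (-2, -1, 1, 2, 3)
}
print(recovered)
```

For every nonzero power the $\langle\kappa_1^2\rangle$ terms cancel in the weighted combination, so the recovered value equals the input background distance up to rounding.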

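From these results we can also calculate the observed angular sizes of standard objects on the sky, determined via the 
solid angle 

\begin{equation}
\langle\Delta\Omega(\mathbf{n}_o)\rangle = \langle d_A^{-2}(\mathbf{n}_o)\rangle\,\Delta S = \Delta\Omega_0, \tag{28}
\end{equation}

while 

\begin{equation}
\left\langle\overline{\Delta\Omega}\right\rangle = \langle d_A^{-2}(\mathbf{n})\rangle\,\Delta S = \Delta\Omega_0\left(1 + 4\langle\kappa_1^2\rangle\right), \tag{29}
\end{equation}

and via the linear angle 

\begin{equation}
\langle\Delta\theta(\mathbf{n}_o)\rangle = \langle d_A^{-1}(\mathbf{n}_o)\rangle\,\Delta L = \Delta\theta_0\left(1 - \frac{1}{2}\langle\kappa_1^2\rangle\right), \tag{30}
\end{equation}

while 

\begin{equation}
\left\langle\overline{\Delta\theta}\right\rangle = \langle d_A^{-1}(\mathbf{n})\rangle\,\Delta L = \Delta\theta_0\left(1 + \frac{3}{2}\langle\kappa_1^2\rangle\right). \tag{31}
\end{equation}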


Here $\Delta S$ and $\Delta L$ are the standard area and standard length, and we have used (11), i.e., $\langle\overline{f}\rangle = \langle f(\mathbf{n})\rangle$. Only one 
of these is invariant: the expectation value of the angular size of a 'standard area'. Any standard ruler, whether 
averaged over directions or not, will receive corrections. 

The observables discussed here are all affected by the square of the convergence $\langle\kappa_1^2\rangle$. The main contribution to 
$\langle\kappa_1^2\rangle$ can be calculated using the Limber approximation: 


(ft 2 ) = 


/ d x'— t ' 

/ 0 XX 


^An'F(x) 


- 4 




£(£ + 1 ) 

2 £ + l 


dX 


/ ( x - x’f 
x'x 2 


g 2 ( X ')(PoT 2 


k = 


£+ 1/2 


X 


(32) 


where $\Delta_\Omega$ is the angular Laplacian, $T$ is the transfer function, $P_0$ is the power spectrum of the primordial gravitational 
potential and $g$ is the growth factor (see [3] for more detail). Figure 2 shows $\langle\kappa_1^2\rangle$ as a function of redshift for different 
values of cosmological parameters. At the last scattering surface, $z \simeq 1100$, $\langle\kappa_1^2\rangle$ reaches 0.6 percent. 

In the discussion above, we have considered only the perturbations with four transverse derivatives of the 
gravitational potential. A crucial consequence is that in this case, the second-order convergence $\kappa_2$ can be written as a 
total divergence, which vanishes on average. The only remaining contribution is therefore the square of the first-order 
convergence $\langle\kappa_1^2\rangle$. However, the full relativistic expression for the distance also contains second-order terms with two 
transverse derivatives of the gravitational potential and second-order terms with no transverse derivatives [refs]. 

The terms with no transverse derivatives are for example due to the integrated Sachs-Wolfe (ISW) effect or the Shapiro time 
delay. These terms change the physical length of the photon geodesic between the source and the observer. They 
do not vanish on average and they therefore affect the mean distance to the source. Their amplitude is however of 
the order of the square of the gravitational potential, $\Psi^2 \sim 10^{-10}$, i.e. much smaller than the first-order convergence 
square in (23) and (24). In addition to the ISW and the Shapiro time delay, the distance also receives corrections 
from the Doppler terms. The square of the velocity is of the order $10^{-6}$ and the Doppler terms are therefore relevant 
only at very low redshift $z < 0.5$. 

The terms with two transverse derivatives describe a coupling between the longitudinal and transverse deflections. 
For example, some of these terms are due to the fact that we average the distance over directions $\mathbf{n}_o$ at a fixed value 
of the redshift. Since the redshift is itself perturbed, $z = z_0 + \delta z$, we obtain contributions proportional to 

\begin{equation}
d_A(z_0 + \delta z, \mathbf{n}_o) \sim d_A'(z_0, \mathbf{n}_o)\,\delta z \sim d_0\,\kappa_1\,\frac{\delta z}{1+z}, \tag{33}
\end{equation}
Figure 2: The main contribution to $\langle\kappa_1^2\rangle$ comes from terms with 4 transverse derivatives, shown here for $\Omega_m = 0.3$, $h = 0.7$ 
and $n_s = 0.96$. It is extremely sensitive to small-scale structure, illustrated here by considering different baryon fractions 
$f_b = 0, 0.15, 0.3$. Higher $f_b$ damps small-scale power below 100 Mpc through Silk damping, resulting in a smaller 

(ki). At z ~ 1100 an approximate dependence on model parameters is (k\) ss 0.42 h 2 ' 5 ^ b+3 n a e~ 12 ' 6Clbh . Note that 

here we have used the linear power spectrum to calculate $\langle\kappa_1^2\rangle$. At high redshift, this is a good approximation, but at low 
redshift the linear expression underestimates the effect. 


i.e. contributions due to the fact that the distance is integrated on a perturbed surface, at a fixed redshift from the 
observer. As seen from Fig. 2, $\langle\kappa_1^2\rangle$ reaches $6\times 10^{-3}$ at very large redshift. The terms with two transverse derivatives 
are therefore roughly of the order $\kappa_1\times\Psi \sim \sqrt{0.006}\times 10^{-5} \sim 8\times 10^{-7}$. It is therefore also justified to neglect these 
types of terms relative to the square of the convergence. 
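The order-of-magnitude estimate above is easy to reproduce. The following sketch only redoes the arithmetic with the numbers quoted in the text (a convergence variance of $6\times 10^{-3}$ and a potential of order $10^{-5}$); the variable names are hypothetical.

```python
import math

kappa_sq_max = 6.0e-3   # <kappa_1^2> at very large redshift, as quoted in the text
potential = 1.0e-5      # typical amplitude of the gravitational potential

# Terms with two transverse derivatives: one power of kappa_1 times the potential.
cross_term = math.sqrt(kappa_sq_max) * potential

# Size of those terms relative to the dominant <kappa_1^2> contribution.
ratio = cross_term / kappa_sq_max

print(f"cross term ~ {cross_term:.1e}, ratio to <kappa_1^2> ~ {ratio:.1e}")
```

The ratio comes out at roughly $10^{-4}$, which is the sense in which these terms are subdominant.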

Finally, let us mention that the distance also contains first-order contributions proportional to the gravitational 
potential at the observer, $\Psi_o$, and to the peculiar velocity at the observer, $\mathbf{v}_o\cdot\mathbf{n}_o$. Whereas the velocity term vanishes 
under directional average, the potential term remains and contributes at the order of $10^{-5}$. This contribution is 
almost three orders of magnitude smaller than $\langle\kappa_1^2\rangle$ at high redshift. Below redshift 1 however, where $\langle\kappa_1^2\rangle$ is much 
smaller, this contribution may be relevant (see [20] for a detailed analysis of the impact of this term on supernovae 
measurements). 


B. Fluxes, Magnitudes and Standard Candles 


For a standard(-izable) candle such as a type Ia supernova, the relevant quantities are not the geometrical variables 
angle and area distance, but rather the flux of photons at the observer and the luminosity distance. Observers typically 
plot the distance modulus $\mu_m = m - M$ to a supernova at redshift $z$ of observed magnitude $m$ and true magnitude 
$M$, inferred via the observed flux $\mathcal{F}$ and defined as follows$^5$, 

\begin{equation}
\mathcal{F}(z) = \frac{L}{4\pi d_L^2(z)}, \tag{34}
\end{equation}

\begin{equation}
\mu_m(z) - 25 = 5\log\left[\frac{d_L(z)}{\mathrm{Mpc}}\right]. \tag{35}
\end{equation}

Here $z$ is the redshift of the supernova (we neglect its perturbation which is justified only for $z > 0.5$, see [refs]), 
and $L$ is its intrinsic luminosity. Observers usually do not have many supernovae with the same redshift in different 
directions. Therefore, they directly fit the observed curve $\mu_m(z)$ to the corresponding curve for some background 
cosmology, without taking into account perturbations. First-order perturbations to the distance have been discussed 
and taken into account as a systematic error [refs]. Here we discuss a shift of the mean value due to second-order 
fluctuations. We concentrate on redshifts $z > 0.5$. The effect on the Hubble constant from close-by supernovae with 
$z < 0.1$ is discussed in [ref]. 

5 The distance modulus, denoted here by $\mu_m$, should not be confused with the lensing magnification $\mu$. 

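The luminosity distance is related to the angular diameter distance through $d_L(z) = (1+z)^2 d_A(z)$. Neglecting the 
perturbations in the redshift we have 

\begin{equation}
\frac{\delta d_L}{d_L} = \frac{\delta d_A}{d_A} = -\kappa_1 - \frac{1}{2}\kappa_2 - \frac{1}{2}|\gamma_1|^2. \tag{36}
\end{equation}

Inserting this into (34) and (35) and expanding up to second order we obtain 

\begin{equation}
\mathcal{F} = \mathcal{F}_0\left(1 + 2\kappa_1 + \kappa_2 + 3\kappa_1^2 + |\gamma_1|^2\right), \tag{37}
\end{equation}

\begin{equation}
\mu_m(z) = \mu_{m0}(z) - 5\kappa_1 - \frac{5}{2}\left(\kappa_2 + \kappa_1^2 + |\gamma_1|^2\right). \tag{38}
\end{equation}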

The impact of second-order lensing on supernovae measurements then really depends on how observations are 
performed. If the observed flux of each supernova is directly compared with the intrinsic luminosity to extract the 
luminosity distance, then the relevant quantity is the expectation value of the flux. This is unaffected by second-order 
lensing, since $\langle\kappa_2\rangle = 0$ and $\langle\kappa_1^2 - |\gamma_1|^2\rangle = 0$: 

\begin{equation}
\langle\mathcal{F}(\mathbf{n}_o)\rangle = \mathcal{F}_0. \tag{39}
\end{equation}


On the other hand, if the distance modulus $\mu_m$ is used directly to extract the luminosity distance, then second-order 
lensing will systematically increase the expectation value 

\begin{equation}
\langle\mu_m(z, \mathbf{n}_o)\rangle = \mu_{m0}(z) + 5\langle\kappa_1^2\rangle, \tag{40}
\end{equation}

leading to an overestimate of the distance $d_L$. Considering Fig. 2 we find that for $z < 2$ the shift in $\mu_m$ is less than 
0.003 and therefore will produce a shift in the dark energy equation of state around the percent level. However, since 
it is a shift with a definite sign leading to a slight overestimation of $\mu_m$ and hence of supernovae distances, this can 
bias parameter estimation if not taken into account. 

The discussion above assumes that the luminosity distance is extracted individually from each supernova. If the 
sample is large enough, one can instead split the supernovae into $N$ bins of redshift and average the observed flux over 
all supernovae in the same bin. This average can be considered as an approximate angular average, or equivalently 
expectation over true source positions (per redshift bin) for fixed size angular patches: 

\begin{equation}
\frac{1}{N}\sum_{i=1}^{N}\mathcal{F}(z, \mathbf{n}_o^i) = \frac{1}{4\pi}\sum_{i=1}^{N}\mathcal{F}(z, \mathbf{n}_o^i)\,\Delta\Omega_{\mathbf{n}_o} \simeq \left\langle\overline{\mathcal{F}(z)}\right\rangle = \langle\mathcal{F}(z, \mathbf{n})\rangle = \mathcal{F}_0(z)\left(1 + 4\langle\kappa_1^2\rangle\right), \tag{41}
\end{equation}


where we used (12). We see that in this case the observed flux is increased by second-order lensing, leading to an 
underestimation of the luminosity distance. Hence even though averaging over supernovae in the same redshift bin 
has the advantage of decreasing the statistical uncertainty in the measurement of the flux, it has the disadvantage of 
introducing a systematic bias in the luminosity distance. This bias is also present if we average the distance modulus 

\begin{equation}
\left\langle\overline{\mu_m(z)}\right\rangle = \mu_{m0}(z) - 5\langle\kappa_1^2\rangle, \tag{42}
\end{equation}


and the distance is also underestimated in this case. From (25) we see that to fully exploit the potential of averaging 
over bins of redshift, without introducing any additional bias, we need to first extract the square of the luminosity 
distance for each supernova in the bin, and then take an average 

\begin{equation}
\left\langle\overline{d_L^2}\right\rangle = d_{L0}^2. \tag{43}
\end{equation}
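The effect of the choice of binned estimator on the inferred distance can be illustrated with the second-order result for the directional average of a power of the distance, $d_0^p\,[1 + p(p-2)\langle\kappa_1^2\rangle/2]$, taking $d_L \propto d_A$ at fixed redshift. The sketch below is a rough illustration: the function name is hypothetical and the value of $\langle\kappa_1^2\rangle$ is an arbitrary example, not a number from the paper.

```python
def inferred_distance_bias(kappa_sq):
    """Fractional bias of the distance inferred from three bin-averaged quantities."""
    def directional_factor(p):
        # Second-order factor multiplying d_0**p in the directional average of d**p.
        return 1.0 + 0.5 * p * (p - 2) * kappa_sq

    return {
        # Average the flux (proportional to d**-2), then invert for the distance.
        "mean flux": directional_factor(-2) ** -0.5 - 1.0,
        # Average the distance itself.
        "mean distance": directional_factor(1) - 1.0,
        # Average the squared distance, then take the square root.
        "mean squared distance": directional_factor(2) ** 0.5 - 1.0,
    }

bias = inferred_distance_bias(1.0e-3)  # illustrative <kappa_1^2>
for name, value in bias.items():
    print(f"{name:>22s}: {value:+.2e}")
```

Under these assumptions the flux average and the plain distance average both underestimate the distance, while the average of the squared distance leaves it unchanged at second order.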


Finally, let us note that in practice, since supernovae are not perfect standard candles, their intrinsic luminosity 
is calibrated on a training subset, for which both the flux and the distance are known. From this subset a relation 
between the intrinsic (peak) luminosity and the shape of the light-curve (for example its width) is derived [24]. This 
relation is then used to determine the intrinsic luminosity of the other supernovae. One can then wonder whether 
this relation is affected by second-order lensing, leading to a systematic bias in the determination of the intrinsic 
luminosity and consequently to an error in the distance measurements. This is fortunately not the case. Indeed, in 
the training subset, both the distance and the flux are affected in a consistent way by lensing$^6$ (following (36) and 
(37)), leaving the intrinsic luminosity unchanged 

\begin{equation}
L = 4\pi d_L^2\,\mathcal{F} = 4\pi d_{L0}^2\,\mathcal{F}_0 = L_0. \tag{44}
\end{equation}

The shape of the light-curve is also unaffected, since lensing is not expected to vary during the time-scale of the 
supernova explosion. As a consequence, lensing can induce a constant shift in the amplitude of the light-curve, but it 
cannot generate a change in the shape. Therefore the relation between the shape of the light-curve and the intrinsic 
luminosity inferred from the training subset is not biased by second-order lensing. 


C. CMB angular power spectrum 


As a last example, we discuss what happens with the CMB. In [3], we calculated the contribution of the shear and 
the convergence to the angular power spectrum of the CMB, up to second order in perturbation theory. We found that 
the lensed power spectrum $\tilde{D}(\ell) = \ell^2\tilde{C}(\ell)$, calculated for a constant magnification matrix, is related to the unlensed 
power spectrum $D(\ell)$ through 

\begin{equation}
\tilde{D}(\ell) = D(\ell) + \left[\kappa + \kappa^2 + \frac{1}{4}|\gamma|^2\right]\ell D'(\ell) + \left[\frac{1}{2}\kappa^2 + \frac{1}{4}|\gamma|^2\right]\ell^2 D''(\ell). \tag{45}
\end{equation}


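If we take a directional average of (45), as we do in [3], i.e. we average the shear and convergence over different parts 
of the sky, we obtain 

\begin{equation}
\left\langle\overline{\tilde{D}(\ell)}\right\rangle = D(\ell) + \frac{5}{4}\langle\kappa_1^2\rangle\,\ell D'(\ell) + \frac{3}{4}\langle\kappa_1^2\rangle\,\ell^2 D''(\ell). \tag{46}
\end{equation}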


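If instead we take an ensemble average, i.e. we average over all possible realisations of the shear and convergence 
fields, we obtain 

\begin{equation}
\left\langle\tilde{D}(\ell)\right\rangle = D(\ell) - \frac{3}{4}\langle\kappa_1^2\rangle\,\ell D'(\ell) + \frac{3}{4}\langle\kappa_1^2\rangle\,\ell^2 D''(\ell). \tag{47}
\end{equation}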
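The step from the single-patch expression to the two averaged spectra is pure bookkeeping and can be spelled out with exact fractions. The sketch below rests on several assumptions that should be checked against [3]: the displacement term in (45) carries the coefficient $\kappa + \kappa^2 + |\gamma|^2/4$ and the smoothing term $\kappa^2/2 + |\gamma|^2/4$; the shear and convergence variances are equal, as in (22); the mean convergence vanishes under a directional average; and it equals $-2\langle\kappa_1^2\rangle$ under an ensemble average at fixed observed direction, which is (6) applied to $\delta_1 = \kappa_1$. The function name is hypothetical.

```python
from fractions import Fraction

def cmb_shift_coefficients(mean_kappa):
    """Coefficients of l*D' and l^2*D'' in units of <kappa_1^2>.

    mean_kappa is the averaged convergence in the same units:
    0 for a directional average, -2 for an ensemble average at fixed direction.
    """
    kappa_sq = Fraction(1)   # <kappa_1^2> in units of itself
    shear_sq = Fraction(1)   # <|gamma_1|^2> = <kappa_1^2>
    displacement = mean_kappa + kappa_sq + Fraction(1, 4) * shear_sq
    smoothing = Fraction(1, 2) * kappa_sq + Fraction(1, 4) * shear_sq
    return displacement, smoothing

directional = cmb_shift_coefficients(Fraction(0))
ensemble = cmb_shift_coefficients(Fraction(-2))
print(directional, ensemble)
```

With these inputs the smoothing coefficient is identical in the two cases and only the displacement coefficient changes sign.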


The smoothing term $\ell^2 D''(\ell)$ is the same in the two cases: it shows that lensing decreases the amplitude of the peaks. 
The pure displacement term $\ell D'(\ell)$ on the other hand is different: it is positive if we take a directional average but 
negative if we take an ensemble average. This is completely equivalent to the effect on the distance described in 
section III A: the ensemble average of the distance is increased whereas the directional average is decreased. Since 
the CMB is averaged over directions, the correct averaging procedure is given by (46), leading to a shift of the peaks 
to lower multipoles$^7$. As discussed in detail in [3], this shift is consistently included in standard Boltzmann codes 
(such as CAMB [ref] or CLASS [25]), since it is due to the square of the first-order convergence. The only term 
in (45) neglected by current CMB analyses is the contribution from the second-order convergence $\kappa_2$. However, as 
already mentioned before, the dominant terms in $\kappa_2$ (those with 4 transverse derivatives) vanish on average, and the 
subdominant terms are negligible, roughly $10^4$ times smaller than the first-order convergence square (see discussion 
at the end of section III A). 


Finally, let us mention that the standard way of calculating the CMB angular power spectrum is not through (46) 
but rather through a calculation of the lensed multipoles $a_{\ell m}$. This calculation automatically selects the correct 
averaging procedure since the $a_{\ell m}$ are defined through a (weighted) average over directions 

\begin{equation}
a_{\ell m} = \int d\Omega_{\mathbf{n}_o}\, Y^*_{\ell m}(\mathbf{n}_o)\, T(\mathbf{n}_o). \tag{48}
\end{equation}

The angular power spectrum is given by 

\begin{equation}
\langle a_{\ell m}\, a^*_{\ell' m'}\rangle = C_\ell\,\delta_{\ell\ell'}\,\delta_{m m'}. \tag{49}
\end{equation}

The ensemble average is therefore automatically taken after the integral over directions. 

6 The distance to supernovae in the training subset is indeed usually measured through the host galaxy using its surface brightness 
fluctuation or the Tully-Fisher relation to determine the galaxy intrinsic luminosity. Photons emitted by the galaxy follow the same 
path as the supernova’s photons and experience therefore the same lensing corrections. 

7 Note that the shift in the observed position of the peaks is governed both by the displacement term and by the smoothing term. Since 
the smoothing term dominates over the displacement term, the shift is also to lower multipoles in (47) but its amplitude is smaller. 
IV. CONCLUSIONS 

In this paper we have shown that at second order in perturbation theory, averaging over the observed directions 
and taking an ensemble average are two distinct procedures, which do not commute. This comes from the fact that 
the observed direction is lensed, hence it is itself perturbed, and its perturbation may well be correlated with the 
observable under consideration. 

This is especially relevant for distance measurements, the perturbations of which are intimately related to lensing. 
In particular, we have shown that the distance to the last scattering surface is decreased by lensing if one takes an 
average over directions, whereas it is increased if one takes an ensemble average. In a companion paper [3] we argue 
that the directional average is relevant for CMB observations and that consequently second-order lensing shifts the 
position of the peaks to lower multipoles. However, as we showed there, the change in the distance captures the 
essence of the change to the peaks but does not accurately capture the shift or damping of the peaks. Consequently, 
the CMB sound horizon does not act as a standard ruler when lensing is present. 

For the analysis of diffuse observables like the CMB (and also the BAO), correlation functions are calculated 
and fitted to the model. Moreover, in multipole space, the average over directions is always taken before the ensemble 
average, and no ambiguity arises concerning the averaging procedure. For standard rulers and candles which are point 
sources, it is much more subtle to remove the bias arising from lensing in going from the observable to the model, 
and the distance measure chosen must be carefully understood. In particular, the notion of standard ruler should be 
extended to a standard area as the expectation value of an observed solid angle is preserved under lensing, which is 
not the case for an observed linear angle. 

We have also shown that for specific observables there exist functions which do not acquire corrections from second- 
order perturbations. It is therefore a good observational strategy to consider these functions of the observables in 
order to avoid systematic errors from second-order perturbations. 


Note added: While this paper was being finalised, an independent paper on the same topic appeared [27]. 